We have used the polymerase chain reaction and V H family-based primers to clone and
sequence 74 human germline V H segments from a single individual and built a directory to
include all known germline sequences. The directory contains 122 V H segments with
different nucleotide sequences, 83 of which have open reading frames. The directory
indicates that the structural diversity of the germline repertoire for antigen binding is fixed
by about 50 groups of V H segments: each group encodes identical hypervariable loops. The
directory should help in mapping the V H locus, in estimating somatic mutation and V H
segment usage and in designing and constructing synthetic antibody libraries.
1. Introduction

Antibody architecture accommodates a wealth of
structural diversity. Heavy and light chain variable
domains (V H and V L ) each consist of a β-sheet scaffold,
surmounted by three antigen-binding loops
(complementarity-determining regions, or CDRs;
Kabat & Wu, 1971) of different lengths which are
fleshed with a variety of different side-chains. The
structural diversity of the loops can create binding
sites of a variety of shapes, ranging from almost flat
surfaces (Amit et al., 1986) to deep cavities (Alzari et
al., 1990). Underpinning the structural diversity is a
combinatorial genetic diversity. For V H domains, it
is generated by the assembly of V H , D (diversity)
and J H (joining) segments. Two of the CDRs (1 and
2) are encoded by the V H segment, and CDR3 by the
3' end of the V H segment, the D segment and the 5'
end of the J H segment. With nucleotide addition
(N-region diversity at the V H -D and D-J H joins), the
use of different reading frames in the D segment,
and the combination of different rearranged heavy
and light chains, the diversity of primary antibody
libraries is huge (for reviews, see Tonegawa, 1983;
Winter & Milstein, 1991). During an immune
response, the antibody variable regions are further
diversified by somatic hypermutation, leading to
higher affinity binding of the antigen (Berek &
Milstein, 1988).

The human V H , D and J H segments have been
mapped to band q32.33 of chromosome 14 (Croce et
al., 1979; Kirsch et al., 1982), and recombine during
B cell development. Each V H segment encodes a 5'
hydrophobic leader peptide and between 95 and 101
amino acid residues of the mature domain flanked at
the 3' end by two recombination signals consisting
of a highly conserved heptamer (5'-CACAGTG-3'), a
23-base-pair spacer and a less-conserved nonamer.
The V H segments have evolved by unequal crossing-over,
conversion, duplication and deletion (Wysocki
& Gefter, 1989; Walter et al., 1990) and can be
divided into six families on the basis of nucleotide
homology of 80% or above (Kodaira et al., 1986;
Lee et al., 1987; Shen et al., 1987; Berman et al.,
1988; Humphries et al., 1988; Buluwela & Rabbitts,
1988). The number of V H segments per individual
has most recently been estimated as about 76 (25
V H 1 segments, 5 V H 2 segments, 28 V H 3 segments, 14
V H 4 segments, 3 V H 5 segments and 1 V H 6 segment;
Walter et al., 1990), although these figures are likely
to be an underestimate (Berman et al., 1988; Walter
et al., 1990).

Earlier attempts to clone human V H segments
have involved constructing and probing large
cosmid libraries, and have been aimed at mapping
and sequencing the whole V H locus, including pseudogenes
(Kodaira et al., 1986; Lee et al., 1987;
Berman et al., 1988). In contrast, we set out to
Table 1

Family-specific primers for PCR amplification of the V H exon

VH1 primers
  VH1 LEA EX1       5'-CCC AAG CTT CCA TGG ACT GGA CCT GGA G-3'
  VH1 LEA EX2       5'-CCC AAG CTT TCA TGG GCT GGA CCT GCA A-3'
  VH1 LEA IN        5'-CCC AAG CTT G(A,G)A (A,G)G(A,G) GAT T(G,T) (A,G,T) (G,T)TC CAG T-3'
  VH1 LEA EX3       5'-CCC AAG CTT (T,C) (C,T) (C,T) (A,G)CA G(G,A) (T,C,A) (G,A) (C,T) (C,T,G) (C,T)A(C,T,G) (T,G)C-3'
  VH1 FR1 (2-8)     5'-CCC AAG CTT (C,G,T)CA(G,A) (C,T)T(A,G,T) (G,T)T(G,A) (C,T)A(G,A) (T,C)C(T,G) G-3'
  VH1 FR1 (17-22)   5'-CCC AAG CTT (T,A)C(A,G) G(T,C)G A(A,G) (G,A) (G,A)T(C,T) (T,A)CC TGC-3'
  VH1 HEPT          5'-GGA ATT CT(C,G) TGG (G,T)TT (C,T)TC ACA CTG TG-3'

VH2 primers
  VH2 LEA           5'-CCC AAG CTT CTT CTC CAC AGG GGT CTT ATC-3'
  VH2 HEPT          5'-GGA ATT CCA CTG TG(C,T) (C,G)CC GCG CAC A-3'

VH3 primers
  VH3 LEA1          5'-CCC AAG CTT T(A,T)(C,T) (A,G)TG TGG CA(A,G,C,T) TTT CTG A-3'
  VH3 LEA2          5'-CCC AAG CTT T(A,T) (C,T) (A,G)T(C,G) TG(A,G) (A,C)A(A,G,C,T) TTT CTG A-3'
  VH3 LEA3          5'-CCC AAG CTT GT(A,T) TGC A(A,G)G TG(C,T) CCA GTG T-3'
  VH3 HEPT          5'-GGA ATT C(A,C)T G(A,G)C (C,T)TC CCC TC(A,G) CT(C,G) TG-3'
  VH3 FR1           5'-CCC CCA AGC TTT GT(G,C) CAG (G,C)CT CTG G(A,G)T TC-3'
  VH3 FR3           5'-GCT CTA GAG T(G,A)A (G,A)TC (T,G)GC C(T,C)T TCA C(A,G)G-3'
  VH3 NON1          5'-GCT CTA GAG GTT TGT G(T,C)C (T,C)GG GC(G,T) CA-3'

VH4 primers
  VH4 LEA           5'-CCC AAG CTT CTG TTC ACA GGG GTC CTG TC-3'
  VH4 HEPT          5'-GGA ATT CAC TCA CCT CCC CTC ACT GTG-3'

VH5 primers
  VH5 LEA           5'-CCC AAG CTT AGG TCA CAG AG(A,G) AGA A(C,T)G G-3'
  VH5 HEPT          5'-GGA ATT CGC TGG TTT CTC TCA CTG TG-3'

VH6 primers
  VH6 LEA           5'-CCC AAG CTT TCA CAG CAG CAT TCA CAG A-3'
  VH6 HEPT          5'-GGA ATT CCT GAC TTC CCC TCA CTG TG-3'


determine the repertoire of human V H segments that
contribute to the structural diversity of the V H
domain. We employed the polymerase chain reaction
(PCR) (Saiki et al., 1988) as a method of amplifying
individual V H segments. We designed family-specific
primers for V H segments based on the
heptamer and part of the recombination spacer at
the 3' end of the V H exon, and regions of the leader
exon or intron at the 5' end. Priming from the
heptamer has been used to amplify mouse
(Borghesi-Nicoletti & Schulze, 1991) and human
(Sanz et al., 1989c) V H segments and has the advantage
that since the heptamer is lost during recombination,
rearranged V H genes are not amplified.

2. Materials and Methods 

(a) Primer design 

Primers were designed (Table 1) for each of the 6 V H
families based on the sequences of published V H segments
(Kodaira et al., 1986; Lee et al., 1987; Berman et al., 1988;
Humphries et al., 1988) and were located as shown in
Fig. 1(a). Forward primers were based around the highly
conserved heptamer recombination sequence,
5'-CACAGTG-3'. For 5 V H families, published germline
sequences were used, basing forward primers (VH1
HEPT, VH3 HEPT, VH4 HEPT, VH5 HEPT, VH6
HEPT) on the heptamer sequence and an additional 11 to
13 nucleotides from the recombination spacer. Degenerate
nucleotides were incorporated to ensure the efficient
priming of known germline genes from each V H family,
and EcoRI restriction sites were added for cloning. As
germline V H 2 sequences were not available, the forward
primer (VH2 HEPT) was designed using the sequence of
the third framework (FR) region from a rearranged V H 2
gene (Takahashi et al., 1984), adding 2 degenerate
bases to substitute for those outside FR3, and then
adding the conserved heptamer sequence. Family-specific
back primers (VH1 LEA EX1, VH1 LEA EX2, VH1
LEA IN, VH1 LEA EX3, VH2 LEA, VH3 LEA1, VH3
LEA2, VH3 LEA3, VH4 LEA, VH5 LEA, VH6 LEA)
were based on those parts of the leader exon and intron
that are highly conserved within, but not between, V H
families, again incorporating degeneracy where necessary
(VH1 LEA EX1 and VH1 LEA EX2 were mixed in equal
ratios and are referred to as VH1 LEA EX 1/2). The back
primers, VH1 FR1 (2-8) and VH1 FR1 (17-22), were
subsequently designed using the sequences obtained with
the first set of PCR primers. HindIII restriction sites were
added to all back primers for cloning.
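
The degenerate notation used in Table 1 can be unpacked mechanically. The following Python sketch is an illustration only and was not part of the original methods; the function names are our own. It expands a primer such as VH1 HEPT into the individual sequences it represents and checks that each one ends in the reverse complement of the heptamer 5'-CACAGTG-3' and carries the EcoRI site (GAATTC).

```python
import itertools
import re


def parse_degenerate(primer: str) -> list[str]:
    """Return the allowed bases at each position of a primer written as in Table 1."""
    body = primer.replace("5'-", "").replace("-3'", "").replace(" ", "")
    # "(A,G)" is one degenerate position; a bare letter is a fixed position.
    tokens = re.findall(r"\(([ACGT,]+)\)|([ACGT])", body)
    return [group.replace(",", "") if group else base for group, base in tokens]


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by the degenerate primer."""
    count = 1
    for choices in parse_degenerate(primer):
        count *= len(choices)
    return count


def expand(primer: str) -> list[str]:
    """List every concrete sequence encoded by the degenerate primer."""
    return ["".join(bases) for bases in itertools.product(*parse_degenerate(primer))]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# VH1 HEPT as listed in Table 1.
VH1_HEPT = "5'-GGA ATT CT(C,G) TGG (G,T)TT (C,T)TC ACA CTG TG-3'"

if __name__ == "__main__":
    print(degeneracy(VH1_HEPT))
    for sequence in expand(VH1_HEPT):
        print(sequence, sequence.endswith(reverse_complement("CACAGTG")))
```

Because the forward primers anneal to the strand opposite the heptamer, the heptamer appears in them as its reverse complement, CACTGTG, at the 3' end.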

"Internal" primers for the V H 3 family were designed 
based on those regions of framework 1 (VH3 FR1) and 
CDR2-framework 3 (VH3 FR3) that display the greatest 
homology within the V H 3 family (see Fig. 2(b)). Since 
EcoRI restriction sites were noted in 2 published V H 3
pseudogenes (V 71-1 and V 71-3; Kodaira et al., 1986) we
changed the cloning site in the forward primer (VH3 FR3)
to XbaI.

(b) Preparation of genomic DNA 

Genomic DNA was isolated from peripheral white blood 
cells obtained from a healthy Caucasian donor, DP, using 
a method described by Perry & Carrell (1989). Briefly, 
Figure 1. Family-specific primers for PCR amplification of the V H exon. (a) Locations of the family-based PCR
primers with respect to the V H exon. FR, framework region; CDR, complementarity-determining region. Back primers
were based in either the leader exon or intron or in framework 1 of the V H segment. Forward primers were based around
the heptamer and nonamer and at the junction of the CDR2 and framework 3. (b) PCR-amplified DNA from
DP run on a 1.5% agarose gel. M, φX174 Mr markers; lanes 1 to 12, amplifications using the sets of primers depicted
in (a).



9 ml whole blood was collected in 1 ml 3.8% (w/v) trisodium
citrate (anticoagulant). The cells were lysed by
adding the mixture to 90 ml ice-cold cell lysis buffer
(0.32 M-sucrose, 1% Triton X-100, 5 mM-MgCl2, 10 mM-Tris-HCl
(pH 7.5)) and left on ice for 15 min. The nuclear
pellet was isolated by centrifugation at 1000 g at 4 °C for
15 min and then resuspended in 4.5 ml Tris/EDTA
(10 mM-Tris-HCl (pH 8.0), 1 mM-EDTA). The pellet was
lysed using 10 ml nuclear lysis buffer (0.32 M-lithium
acetate, 2% (w/v) SDS, 10 mM-Tris-HCl (pH 8.0), 1 mM-EDTA),
extracted twice with phenol/chloroform, once
with chloroform and precipitated using ice-cold ethanol.
Samples were resuspended in 500 µl of water and quantified
by measuring their absorbance at 260 nm.

(c) PCR amplification and sequencing 

Primers were synthesized on an Applied Biosystems
(Foster City, CA) oligonucleotide synthesizer. Genomic
DNA was amplified using the pairs of PCR primers
(Fig. 1(a) and Table 1), in a Techne programmable Dri-Block
PHC-1 thermal cycler (Cambridge, U.K.) with either
Promega (Madison, WI) or Cetus (Perkin Elmer,
Norwalk, CT) Thermus aquaticus (Taq) DNA polymerase.
Reaction mixtures (50 µl) were prepared containing
25 pmol of each primer, 5 to 10 µg of genomic DNA,
2.5 units of Taq polymerase, 200 µM (each) dNTPs and
the recommended buffer (Promega: 50 mM-KCl, 10 mM-Tris-HCl
(pH 8.8), 1.5 mM-MgCl2, 0.1% Triton X-100;
Cetus: 50 mM-KCl, 10 mM-Tris-HCl (pH 8.3), 1.5 mM-MgCl2,
0.001% (w/v) gelatin). The reaction mixture was
overlaid with paraffin oil and 30 cycles of amplification
were performed. Each cycle consisted of denaturation
(94 °C for 1 min), annealing (55 °C for 1 min) and extension
(72 °C for 2 min). At the end of 30 cycles, there was a final
extension at 65 °C for 5 min. The product was analysed by
running 5 µl on a 1.5% (w/v) agarose gel. The remainder
was extracted with phenol/chloroform, precipitated with
ethanol and digested with restriction enzymes HindIII
and EcoRI (or XbaI). A band of the expected size was cut
from a 1.5% low melting point agarose gel and then
purified by adsorption onto glassmilk using Geneclean II
(Bio 101, La Jolla, CA) or by electroelution followed by
precipitation with ethanol.

The product was ligated into M13-K19 (Carter et al.,
1985) that had been digested with HindIII and EcoRI (or
XbaI). The ligation mix was used to transform E. coli
BMH 71-18 cells (Gronenborn, 1976) by electroporation
(Dower et al., 1988) using the Bio-Rad (Richmond, CA)
Gene Pulser and plated on TYE plates (Miller, 1972).
Single-stranded template from selected plaques was
prepared and sequenced using the dideoxy chain termination
method (Sanger et al., 1977) and modified T7 DNA
polymerase (Sequenase II; United States Biochemical
Corp., Cleveland, Ohio). The sequence was read in one
direction and compressions resolved using deoxyinosine
triphosphate (Mills & Kramer, 1979).

Several precautions were taken to avoid cross-contamination.
PCR reaction mixes were subjected to
high intensity, short-wave u.v. radiation (Amplirad,
Genetic Research Instrumentation, Dunmow, Essex,
U.K.) for 5 min before adding genomic DNA to destroy
any DNA contamination. Negative controls (no genomic
DNA added) were always included in all amplifications to
check for DNA contamination. Independent amplifications
with identical sets of primers were undertaken
simultaneously to avoid clones isolated from one amplification
contaminating the next. In all cases we imposed
the requirement that each germline V H segment was seen
in at least 2 independent amplifications.

(d) Probing 

Oligonucleotide probes, 17 to 21 nucleotides in length
(Table 2), were designed as described in Results, and
synthesized as above. Phage plaques were picked onto
duplicate TYE plates and grown as colonies for 30 h at
37 °C. (Plaques that should hybridize to the probes were
always included as positive controls.) The colonies were
lifted onto Hybond nylon filters (Amersham Int.,
Amersham, U.K.), denatured in 5% (w/v) SDS, 2 x SSC
(300 mM-NaCl, 30 mM-trisodium citrate, pH 7.0) for
2 min, baked in a microwave oven for 2.5 min and auto-crosslinked
by short-wave u.v. (Stratalinker: Stratagene,
La Jolla, CA) (Buluwela et al., 1989). Filters were prehybridized
for 20 min at 42 °C in 15 ml hybridization
solution (1 M-NaCl, 1 x Denhardt's (0.02% Ficoll, 0.02%
polyvinylpyrrolidone, 0.02% bovine serum albumin),
100 mM-Tris-HCl (pH 7.5), 6.25 mM-EDTA, 1 mM-sodium
pyrophosphate, 0.5% Nonidet P40, 0.006% rATP, 0.02%
brewers' yeast tRNA) using a Techne HB-1 Hybridiser
(Cambridge, U.K.).

For probing, 15 pmol of oligonucleotide were phosphorylated
with 30 µCi [32P]dATP for 30 min using
2 units of polynucleotide kinase (New England Biolabs,
Beverly, MA) in 30 µl 50 mM-Tris-HCl (pH 7.5), 10 mM-MgCl2,
1 mM-dithiothreitol, and incorporation of 32P
checked by electrophoresis of the oligonucleotide on an
18% (w/v) polyacrylamide gel. The probe was added to
the hybridization solution, and the filters were hybridized
at 42 °C for 2 h and then washed with 40 ml 6 x SSC (see
above), 0.1% SDS, 0.1% sodium pyrophosphate at this
temperature for 15 min and then for 20 min with 40 ml
3 M-TMACl (tetramethylammonium chloride) in 50 mM-Tris-HCl
(pH 8.0), 0.1% SDS and 2 mM-EDTA (Wood et
al., 1985) at 59 °C (17-mer), 61 °C (18-mer), 63 °C (19-mer)
or 67 °C (21-mer). Filters were dried and exposed to
Kodak Fast Film overnight using an intensifying screen
at -70 °C. Filters were recycled by washing at 80 to 90 °C
for 5 min in 2 x SSC and could be probed several times
without loss of signal.

(e) Compilation of germline and rearranged V H database 

DNA sequences were aligned and translated by a
sequence analysis program (MacVector, IBI Kodak, New
Haven, CT). In order to compile a comprehensive database
of both human germline and rearranged V H
sequences we searched MedLine (U.S. National Library of
Medicine), GenBank (IntelliGenetics Inc., Mountain
View, CA) and Kabat (Proteins of Immunological
Interest, Kabat et al., 1991) databases (for references, see
Figs 2 and 3) and incorporated our own data. Rearranged
genes were assigned to their closest germline counterparts
by the presence of specific motifs in the protein sequence
indicative of a particular V H segment or by maximum
homology of the nucleotide sequences (using MacVector).
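
The assignment by maximum homology can be pictured with a small Python sketch. This is only an illustration under simplifying assumptions: the sequences are taken to be already aligned and of equal length, homology is reduced to a count of mismatched positions, and the segment names and sequences are invented toy values, not entries from the directory. It does not reproduce the MacVector procedure.

```python
def differences(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_germline(rearranged: str, germline: dict[str, str]) -> tuple[str, int]:
    """Return (name, differences) for the germline segment nearest to a rearranged gene.

    Ties are resolved in favour of the segment listed first in the dictionary.
    """
    scored = ((name, differences(rearranged, seq)) for name, seq in germline.items())
    return min(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    toy_germline = {"toy-A": "QVQLVQSG", "toy-B": "EVQLVESG"}
    print(closest_germline("EVQLLESG", toy_germline))
```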

3. Results 

(a) Strategy 

We designed family-specific PCR primers based 
on sequences from the literature and amplified, 
cloned and sequenced germline V H segments from 
our donor DP. Nucleotide sequences were aligned 
and taken as confirmed when seen as identical in 
two independent amplifications. Genes which 
remained unconfirmed in phase 1 were probed for 
with 32P-labelled oligonucleotides and sequenced in
phase 2. 
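
The confirmation rule (an identical sequence seen in at least two independent amplifications) amounts to simple bookkeeping, sketched below in Python. The data layout, a list of (amplification identifier, sequence) pairs, and the function name are assumptions made for this illustration.

```python
from collections import defaultdict


def confirmed_sequences(clones, min_amplifications=2):
    """Return the sequences seen in at least `min_amplifications` independent amplifications.

    `clones` is an iterable of (amplification_id, sequence) pairs. Repeated clones
    from the same amplification count only once.
    """
    seen_in = defaultdict(set)
    for amplification_id, sequence in clones:
        seen_in[sequence.upper()].add(amplification_id)
    return {seq for seq, amps in seen_in.items() if len(amps) >= min_amplifications}


if __name__ == "__main__":
    # Hypothetical toy clones: ACGA occurs twice, but only within amplification 1.
    toy_clones = [("amp1", "ACGT"), ("amp2", "ACGT"), ("amp1", "ACGA"), ("amp1", "ACGA")]
    print(confirmed_sequences(toy_clones))
```

Counting amplifications rather than clones matters because a polymerase error or a PCR cross-over made early in one reaction can give several identical clones from that reaction.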

(b) Phase 1: PCR amplification and sequencing of 
random clones 

Genomic DNA was amplified using sets of family- 
based primers. The majority of primer combinations 
Table 2

Oligonucleotide probes used for identification of germline V H segments

V H 1 family
  DP-1                            5'-AGT AAT ACG TGG CCG TG-3'
  DP-1/8                          5'-TGT GCC ACC ACT GTT AG-3'
  DP-9                            5'-CCA CTG CCA ACG ACG AT-3'
                                  5'-ATT GTT TCA CCA TCT TC-3'
  DP-4                            5'-TGC AGG TAG CGG TAG GT-3'
  DP-4                            5'-ACC ATT GAA AGG TGT GA-3'
  DP-5                            5'-TGG ATA ATT CAG TGA GG-3'
  DP-6                            5'-ACT GTG TAA AGT ATT TG-3'
  DP-7/22                         5'-CAG TGC ATA TAG TAG CT-3'
                                  5'-TCG TCA GAT CTC AGC CT-3'
                                  5'-CCA GCT GAT AGC ATA GC-3'
  DP-?/21                         5'-GGT TCC CAG TGT TGG TG-3'
  DP-10                           5'-TGC TGT ACC AAA GAT AG-3'
  DP-11                           5'-AGG TGT ATC CAC AAG TCT-3'
                                  5'-ATC ACT AGG GCA CAC CAA-3'
                                  5'-ACA TTG GGT TCA CCA GGG-3'
  DP-14/22                        5'-TGT GTT ACC ATT GTA AG-3'
                                  5'-AGT TGA TAT CAT AAC TG-3'
  DP-16/17/20                     5'-TTG CCA GAG TAG CTC CC-3'
  DP-18                           5'-GAT CTG AAG ACA CGC CG-3'
  DP-19                           5'-GAC TAC ACC AGT TGG AC-3'
  DP-19                           5'-GTT CAT AAA GTA GTC GG-3'
  DP-19                           5'-TGC TCG AAG ATG TGT CC-3'
  DP-19/23/25                     5'-GTG TTA CCA TTG CCA GC-3'
  DP-21                           5'-CAA CTC AGA CCC AGA TT-3'
  DP-22                           5'-CGG CCA TGT CGT CAG AT-3'
  DP-23                           5'-GCA TAA AGT TGT TGG TG-3'
  DP-24                           5'-CCC AGG TTT CCT CAC CT-3'
  DP-1/7/8/10/14/19/21/22/23/25   5'-CAC TGT GTC TCT CGC AC-3'

V H 3 family
  DP-29                           5'-TTG TTT CTA GTA CGG CCA A-3'
  DP-30                           5'-TTC TTA TTA AAC CTA CCA A-3'
  DP-31                           5'-CAC TAT TCC AAC TAA TAC C-3'
  DP-32                           5'-GTG CTA CCA CCA TTC CAA T-3'
  DP-33                           5'-CAC CAT CCC AAC TAA TAA G-3'
  DP-36                           5'-AGC TTT GCT TTT AAT ACA G-3'
  DP-37                           5'-AGC TTT GCT TTT AAT ACG G-3'
  DP-41                           5'-GTG CAT GCC ATA GTT ACT G-3'
  DP-42                           5'-TAC CAC CGC TAT AAA TAA C-3'
  DP-44/45                        5'-GTG CCA CCA CCA GTA CCA A-3'
  DP-44/45/46/61                  5'-CAG TGC ATA GCA TAG CTA C-3'
  DP-47                           5'-CCA CTA CCA CTA ATA GCT G-3'
  DP-49/50                        5'-CAG TGC ATG CCA TAG CTA C-3'
  DP-46/49                        5'-TCA TAT GAT ATA ACT GCC A-3'
  DP-50                           5'-TCA TAC CAT ATA ACT GCC A-3'
  DP-51                           5'-CAG TTC ATG CTA TAG CTA C-3'
  DP-52                           5'-CAG TGC AGA ACA TAG CTA C-3'
  DP-53                           5'-CCA TCA CTA TTA ATA CGT G-3'
  DP-54                           5'-TTC CAT CTT GCT TTA TGT T-3'
  DP-55/56                        5'-CCC CAT TAG GAT TAA CTT G-3'
  DP-58                           5'-AGT TCA TTT CAT AAC TA-3'
  DP-59                           5'-CAG TTC ATG TCA CTG TTA C-3'
  DP-60                           5'-GTA GCC ATA GCA CGC ACT G-3'
  DP-61                           5'-ACC CCC ATT ACT ACT AAT A-3'

V H 4 family
  V 58
  V 79
                                  5'-GCC CCA GTA GTA ACT ACT ACT-3'
                                  5'-GTA GTA ACC ACT GAC GGA C-3'
                                  5'-AGT TGG GGT TCC CAC TAT G-3'
                                  5'-GGT CCC CGG AGG CTT CAC C-3'

Rearranged gene probes
  333, 1H1, etc.                  5'-CAG TGT ATG GTG GAG TCA C-3'
  VDJ191                          5'-AGT CAG GGC ATG ATT ATT A-3'
  39-1                            5'-GCC CAC ACC CAC TCC ACT AGT-3'
  41-1                            5'-GCC CAC ACC CCC TCC ACT AGT-3'


produced good intensity PCR bands, as is shown in 
Figure 1(b), but amplifications using VH1
EX3/VH1 HEPT and VH2 LEA/VH2 HEPT were 
variable and hence are not shown. Initially, 596 
random clones were sequenced (V H 1 family (170), 
V H 2 family (120), V H 3 family (150), V H 4 family (120), 
V H 5 family (24) and V H 6 family (12)). With one 
exception (one V H 5 gene found in a V H 1 library), the 
primers proved family-specific. This initial round of 
sequencing established 35 V H sequences (including 
pseudogenes) that were identical in at least two 
independent PCR amplifications (V H 1 family (12), 
V H 2 family (3), V H 3 family (8), V H 4 family (10), V H 5 
family (1) and V H 6 family (1)) and by this criterion 
correspond to germline segments. 

Many sequences were unconfirmed due to single
nucleotide differences between clones from independent
amplifications, presumably due to errors introduced
by the Taq polymerase. The 61 single base
changes seen per 100 sequences for the V H 1 and V H 3
families correspond to 7 x 10^-5 changes/nucleotide
per cycle, which is consistent with the Taq polymerase
error rate suggested by Maruyama (1990).
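
The arithmetic behind this figure can be checked with a short Python sketch. The number of nucleotides read per clone is not stated above; the value of 290 used here is our assumption, chosen as a plausible length for a V H segment, and the calculation simply spreads the observed changes evenly over all nucleotides and all 30 cycles.

```python
def error_rate_per_nt_per_cycle(changes: int, sequences: int,
                                length_nt: int, cycles: int) -> float:
    """Observed base changes divided by (sequences x nucleotides read x PCR cycles)."""
    return changes / (sequences * length_nt * cycles)


if __name__ == "__main__":
    # 61 changes per 100 sequences and 30 cycles are taken from the text;
    # 290 nucleotides per sequence is an assumed read length.
    rate = error_rate_per_nt_per_cycle(changes=61, sequences=100, length_nt=290, cycles=30)
    print(f"{rate:.1e} changes/nucleotide per cycle")
```

With these inputs the estimate comes out close to the 7 x 10^-5 quoted above; a different assumed read length scales the result in inverse proportion.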

Other sequences, never confirmed in independent 
amplifications (but sometimes found in more than 
one clone from the same amplification), consisted of 
two parts, each of which could be aligned to 



different V H segments. As became clear on probing 
(see below), these sequences arose from partially 
extended fragments reannealing to a different 
segment after heat-denaturation. This phenomenon, 
termed "PCR cross-over", has also been seen in the 
detection of homologous recombinants (Frohman & 
Martin, 1990) and in the amplification of preproinsulin
cDNA (Shuldiner et al., 1989) and in this
study accounted for 10% of all V H 1 and V H 3 clones 
sequenced. 

For the smaller V H families (V H 2, V H 4, V H 5, V H 6),
all sequences were confirmed in phase 1, or could be
explained by PCR artifacts. But many sequences
from the V H 1 and V H 3 families remained unconfirmed,
requiring systematic probing of a larger
number of clones.

(c) Phase 2: probing and directed sequencing 

With the V H 1 primers, 42 different sequences 
(excluding obvious PCR errors caused by single base 
substitutions) were obtained in phase 1. Only 12 of 
these sequences were identical in at least two inde- 
pendent amplifications. Therefore, motif-specific 
probes were designed (Table 2) such that each probe 
would identify a group of different V H 1 clones with a 
particular sequence motif. Hence, when each clone 
was probed in turn with each of the 29 probes, it
could be distinguished by its "fingerprint", i.e. the
set of sequence motifs that it contains. Thus, 1750
clones from independent amplifications using the
five V H 1-based primer combinations (Fig. 1) were
regridded and hybridized with the 29 probes. Clones
that appeared to confirm a sequence from phase 1
by "fingerprinting" were sequenced. In this way a
further 11 V H 1 sequences were confirmed and only
two new (pseudo)genes (DP-17, DP-20) were discovered.
Nineteen of the original 42 sequences could
not be confirmed by probing, but 18 of these could
be attributed to "PCR cross-over".
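
The idea of a probe "fingerprint" can be illustrated in silico with the Python sketch below. It is a simplification and not the hybridization procedure itself: a probe is counted as positive only when it, or its reverse complement, matches the clone exactly, and the probe names and sequences are invented toy values rather than entries from Table 2.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def fingerprint(clone: str, probes: dict[str, str]) -> frozenset:
    """Return the names of the probes whose motif occurs in the clone on either strand."""
    clone = clone.upper()
    hits = set()
    for name, probe in probes.items():
        motif = probe.replace(" ", "").upper()
        if motif in clone or reverse_complement(motif) in clone:
            hits.add(name)
    return frozenset(hits)


if __name__ == "__main__":
    # Hypothetical toy probes and clones, for illustration only.
    toy_probes = {"m1": "AAA CCC", "m2": "GGT TAA"}
    clone_a = "TTTAAACCCTTT"
    clone_b = "CCGGTTAACC"
    chimera = clone_a[:9] + clone_b  # mimics a PCR cross-over product
    for label, seq in [("a", clone_a), ("b", clone_b), ("chimera", chimera)]:
        print(label, sorted(fingerprint(seq, toy_probes)))
```

In the toy example the chimeric clone carries the motifs of both parents, which is how a cross-over product would stand out when the fingerprints are compared.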

For the majority of unconfirmed sequences in the 
V H 3 family, we designed gene-specific probes (17- 
and 19-mers, Table 2), except in the case of DP-46/ 
DP-49, where three probes were necessary for iden- 
tification, and DP-44/45 and DP-56/57, where dis- 
crimination between the two in each pair was not 
possible. Probes were centred on the region of 
greatest heterogeneity within a CDR and therefore 
a single probe (with the above exceptions) could 
identify a single V H segment. Thus, 1100 clones
taken from independent amplifications with the
three sets of V H 3 leader/heptamer-based primers
(Fig. 1) were hybridized in turn with the 21 probes
and a further 22 V H 3 segments were confirmed by
directed sequencing. The remaining unconfirmed
sequences could be attributed to PCR artifacts.

We also designed "internal" VH3 primers (VH3 
FR1 and VH3 FR3) based on sequence data from 
phase 1 and phase 2. Genomic DNA from DP was 
amplified as before, and 48 randomly selected clones 
were sequenced and confirmed, when necessary, in 
two independent amplifications by probing and 
directed sequencing. Only seven new V H segments
were obtained, three of which appeared to be fragmented
pseudogenes with less than 60% homology
to any known V H segment. Two sequences had been
published before and have unusual heptamer
sequences (DP-59/V H 19 and DP-62/V 71-1, respectively),
and the other two sequences were new
(DP-60 and DP-61).

To isolate full length versions of genes DP-59 to 
DP-61, which have open reading frames, we 
designed a primer (VH3 NON1) based on nonamer 
sequences of V H segments reviewed by Pascual & 
Capra (1991). Amplifications of genomic DNA were 
performed using VH3 LEA3 and VH3 NON1, and 
the resulting fragments were cloned, regridded and 
probed with oligonucleotides specific for DP-59, 
DP-60 and DP-61. DP-59 and DP-60 were isolated 
from independent PCR amplifications, and shown to 
have unusual heptamer sequences. A full length 
version of DP-61 was not found in this library. 

We also attempted to confirm additional germline 
V H segments reported in the literature and germline 
analogues of published rearranged genes. Using the 
V H family-specific primers (Table 1) to amplify and 
clone germline V H 2, V H 3 and V H 4 segments, we 
probed (Table 2) for the germline V H segments V 11,
V 58, V 79 and V 2-1 (Lee et al., 1987), rearranged V H
genes 39-1, 41-1 (Deane & Norton, 1990), VDJ191
(Mensink et al., 1986) and 333, 1H1, 2C12, 2A12,
1B11, 112, 115 and 126 (Cleary et al., 1986) (rearranged
genes were probed for at low stringency, i.e.
TMACl wash at 50 °C). None of these genes was
identified in our libraries.



(d) Sequence directory 

The 74 germline V H segments (25 V H 1 segments, 
3 V H 2 segments, 34 V H 3 segments, 10 V H 4 segments, 
1 V H 5 segment and 1 V H 6 segment) cloned and 
sequenced by us are prefixed "DP", the initials of 
our donor and are denoted by running numbers. Of 
these, 51 have open reading frames and 23 contain 
either frame shifts or stop codons and are therefore 
considered to be pseudogenes. We have also 
included sequences of germline V H segments 
published by others. The protein and nucleotide 
sequences of all 83 germline V H segments with open 
reading frames are given in Figure 2(a) and (b), 
respectively, and nucleotide sequences of the 39 
germline V H segments with interrupted reading 
frames (either frame shifts or stop codons) in Figure 
2(c). In Figure 2(b), the nucleotide sequences in 
each family have been aligned to a sequence with an 
open reading frame, 21-2 (V H 1 family), V II-5 (V H 2
family), 12-2 (V H 3 family), V 71-2 (V H 4 family),
VH251 (V H 5 family), V H -VI (V H 6 family). The same 
sequences were used to align the pseudogenes in 
Figure 2(c).

"f1-p1" is a V H segment described by Olee et al.
(1991), which was seen in amplifications of genomic
DNA from two individuals, Fer and Pla. The V H
segments hv3005b54, hv3019b13, hv3019b18 (Olee
et al., 1991), V H 4.12, V H 4.14, V H 4.15 (Sanz et al.,
1989c) are genes amplified by PCR, but not
confirmed either by probing, independent amplifications,
a rearranged sequence or by independent
work. These sequences may be the result of PCR
artifacts and have therefore been excluded from
Figure 2.

Within each family, protein sequences are
arranged alphabetically by the amino acid residues
(single letter code) of CDR1 and where these are
identical by CDR2 (Fig. 2(a)). Sequences with
minor framework differences, which could include
allelic differences, are therefore adjacent. Sequences
with identical encoded CDRs 1 and 2 are grouped
with brackets (these also have identical H1 and H2
hypervariable loops, as defined by Chothia et al.
(1992), except in the case of 21-2/3-1/DP-7 and
HG3; and V H 4.11/DP-71, V 71-4 and V H 4.16). The
canonical structure classes of H1 (CDR1) and H2
(CDR2) (Chothia & Lesk, 1987; Chothia et al., 1989,
1992) are shown, and those sequences that may be
defective on structural grounds are marked with an
X (see Chothia et al., 1992). The canonical structure
class of DP-61 is unknown.
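
The ordering and bracketing used in Figure 2(a) can be expressed as a sort followed by a grouping step. The Python sketch below is an illustration only; the segment names and CDR strings are invented toy values, and the tuple layout (name, CDR1, CDR2) is an assumption made for the example.

```python
from itertools import groupby


def group_by_cdrs(segments):
    """Sort segments alphabetically by CDR1, then CDR2, and group identical CDR pairs.

    `segments` is a list of (name, cdr1, cdr2) tuples. The result is a list of groups
    of names, in sorted order; each group shares the same encoded CDR1 and CDR2.
    """
    def cdr_key(segment):
        return (segment[1], segment[2])

    ordered = sorted(segments, key=cdr_key)  # stable: input order is kept within a group
    return [[segment[0] for segment in group] for _, group in groupby(ordered, key=cdr_key)]


if __name__ == "__main__":
    # Hypothetical toy entries, for illustration only.
    toy_segments = [
        ("toy-1", "SYA", "VIS"),
        ("toy-2", "GYY", "WIN"),
        ("toy-3", "SYA", "VIS"),
        ("toy-4", "SYA", "AIS"),
    ]
    print(group_by_cdrs(toy_segments))
```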

V H segments that have heptamers other than the 
conserved 5'-CACAGTG-3' motif are marked H. 
The nonamer is generally conserved within each 

Figure 3. Assignment of rearranged human V H genes to
their closest germline counterparts. (a) Germline V H
segments and the closest rearranged V H gene; references
are (1) Bridges et al. (1991)†; (2) Deane & Norton (1990)‡§;
(3) Kipps et al. (1989)§; (4) Manheimer-Lory et al. (1991)†;
(5) Noma et al. (1984); (6) Pascual et al. (1990)†; (7) Marks
et al. (1991b)‡; (8) Silberstein et al. (1989)†; (9) Larrick et
al. (1989a)†; (10) Schroeder & Wang (1990)¶; (11) Ermel et
al. (1991)†; (12) Karr et al. (1991)¶; (13) Brown et al.
(1991)†; (14) Schroeder et al. (1987)¶; (15) Geng et al.
(1991)†; (16) Marks et al. (1991a)‡; (17) see Olee et al.
(1991)¶; (18) Timmers et al. (1991); (19) Bye et al. (1992)‡;
(20) Schutte et al. (1991)†‡§; (21) Sanz et al. (1989a)†; (22)
Desai et al. (1990)§; (23) Hughes-Jones et al. (1990).
(b) Distribution of the number of amino acid differences
between each rearranged V H gene (268 examples) and its
closest germline counterpart. Data were taken from the
above references and Kenten et al. (1982); Takahashi et al.
(1984); Kudo et al. (1985); Mensink et al. (1986);
Dersimonian et al. (1987)†; Shen et al. (1987)§; Berman et
al. (1988); Meeker et al. (1988)§; Newkirk et al. (1988);
Cairns et al. (1989)†; Carroll et al. (1989); Chen et al.
(1989)§; Dersimonian et al. (1989)†; Gillies et al. (1989);
Kishimoto et al. (1989); Larrick et al. (1989b); Logtenberg
et al. (1989)¶†; Nakatani et al. (1989); Nickerson et al.
(1989)†; Sanz et al. (1989b)†; Yasui et al. (1989); Akahori
et al. (1990); Felgenhauer et al. (1990); Friedlander et al.
(1990); Guillaume et al. (1990)¶†; Robbins et al. (1990)†;
Roudier et al. (1990)†§; Siminovitch & Chen (1990)†;
Spatz et al. (1990)§; van der Heijden et al. (1990); White et
al. (1990); Andris et al. (1991); Ezaki et al. (1991)†;
Friedman et al. (1991)†; Kuppers et al. (1991)§; Mortari et
al. (1991); Pascual et al. (1991); Rioux et al. (1991)†;
Silberstein et al. (1991); van Es et al. (1991)†; Mierau et al.
(1992). Some of the references include sequences from



family:

(V H 1, 5'-TCAGAAACC-3';
V H 2, 5'-ACAAAAACC-3';
V H 3, 5'-ACACAAACC-3';
V H 4, 5'-ACAAAAACC-3' or 5'-ACACAAACC-3';
V H 5, 5'-TCTAAAACC-3';
V H 6, 5'-ACACAAACC-3').

Where the nonamer sequence differs from the family
consensus the V H segment is marked N.

We compiled a database of 292 rearranged (but
not necessarily functional) V H genes and assigned
268 of these, from 64 different sources (see legend to
Fig. 3), to their closest germline counterparts. In
Figure 3(a) we list the V H segments, each with an
example of a rearranged V H gene with the smallest
number of amino acid differences. These data are
summarized in Figure 2(a), with sequences marked
R having rearranged counterparts with the
indicated number of amino acid differences. The
distribution of the number of amino acid differences
across all 268 assigned rearranged genes is shown in
Figure 3(b): 215 of the 292 rearranged V H genes in
our database have germline counterparts seen in DP
(data not shown).

We were unable to assign 24 rearranged genes
from the V H 3 (VDJ191, Mensink et al. (1986); X51,
X61, X71, Timmers et al. (1991); K6H6, K4B8,
K5B8, K5G5, K6F5, K5C7, Kon et al. (1987); 333,
1H1, 2C12, 2A12, 1B11, 112, 115 and 126, Cleary et
al. (1986)) and V H 4 (TS2, Shen et al. (1987); HIVB,
Andris et al. (1991); C6B2, Hoch & Schwaber (1987);
2A4, Davidson et al. (1990); 12-3, 30-2, Deane &
Norton (1990)) families. Of these, 12-3 (Deane &
Norton, 1990) is almost certainly the result of a
PCR cross-over and the others appear to be derived
from a possible four to six unknown germline V H
segments.

(e) Germline sequence variability

Based on data from Figure 2(a), we have
constructed variability plots, shown in Figure 4, for
germline V H segments with open reading frames
from all six families, as well as separate plots for the
V H 1 and V H 3 families. We excluded only those
sequences marked X, which may be defective on
structural grounds (see above). At each position, a
variability score was calculated as the number of
different amino acids at that position, divided by
the percentage frequency of occurrence of the most
common amino acid (see Kabat et al., 1991).
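
The variability score described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the sequences are pre-aligned and of equal length, and the frequency of the most common amino acid is taken as a fraction between 0 and 1, as in the usual Wu and Kabat form; if a percentage is used instead, every score is simply divided by 100. The function names are hypothetical.

```python
from collections import Counter


def variability(column: str) -> float:
    """Variability at one aligned position.

    Number of different amino acids divided by the fractional
    frequency of the most common amino acid at that position.
    """
    counts = Counter(column)
    most_common_freq = counts.most_common(1)[0][1] / len(column)
    return len(counts) / most_common_freq


def variability_plot(aligned: list) -> list:
    """Variability score for every position of a set of aligned sequences."""
    return [variability("".join(col)) for col in zip(*aligned)]
```

An invariant position scores 1.0; a position with three different residues, the most common of which occupies half of the sequences, scores 3 / 0.5 = 6.0.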

4. Discussion

(a) Cloning and sequencing strategy

Our strategy for sequencing V H segments by PCR
amplification of genomic DNA is based on the use of
family-specific primers designed from the sequences
of the six known V H families. We were able to assign
most of the rearranged V H genes to germline V H
segments in Figure 2 with few differences in amino
acid sequences (Fig. 3(b)), but may have missed V H
segments that are significantly different in the
primer regions; for example, we did not find the
germline counterparts of the rearranged genes
determined by Cleary et al. (1986). Indeed, they have
been classified as belonging to a new family (V H 7) by
some authors (Schroeder et al., 1990), but they
might also be highly mutated genes derived from a
known germline V H segment (especially as they were
derived from B-cell lymphomas).

several different rearranged V H genes: all of the sequences
(except where the genes could not be assigned, see
Results) have been used. For key to annotation of
references (†, ‡, § and ¶) see Discussion.

Figure 4. Variability plot for germline V H segments. Variability was calculated (see Results) across protein sequences
shown in Fig. 2(a), but excluding those that are likely to be defective on structural grounds (marked X). Plots were
produced for the V H 1 family, V H 3 family and across all 6 families.

Since our aim was to determine the structural
repertoire of human V H segments, the majority of
primers were designed to amplify genes with
"functional" heptamer recombination sequences
(5'-CACAGTG-3'). We have therefore missed some
genes with different heptamers, which presumably
include some pseudogenes. For example, three
sequences which were amplified with internal V H 3
primers and have unusual heptamer sequences,
DP-59/V H 19, DP-60 and DP-62/V 7? . lf, were not
amplified using the heptamer primers. It is,
however, unclear what constitutes a functional
heptamer; indeed, in a recent study, Shin et al.
(1991) discovered two V H 2 segments with an
unusual heptamer sequence (5'-CACAAAG-3'). One
of these segments has been seen as a rearranged
gene (see Fig. 3(a)). This suggests that the
5'-CACAGTG-3' heptamer sequence is not the only
one used for recombination and, consequently, that
the D segment heptamer may also be degenerate.
This, and the fact that these V H 2 segments (V II-5)
have an additional amino acid residue in framework
3, may explain the poor performance of our V H 2
primers and the relatively low number of V H 2
segments isolated here.

In addition, those genes with open reading frames
(Fig. 2(a) and (b)) may be non-functional for other
reasons. For example, the V H 1 segments 1-1
(Berman et al., 1988) and V 71-5 (Kodaira et al., 1986)
have single base differences in the recombination
nonamer and the leader intron splice site,
respectively, and 1-v (Berman et al., 1988) has a frame
shift in the leader exon. Certain V H segments may
also be defective on structural grounds (marked X
in Fig. 2(a), see Chothia et al., 1992).

To avoid polymerase copying errors, we screened
more than 2000 clones using motif- or gene-specific
oligonucleotide probes to ensure identical nucleotide
sequences from two independent amplifications.
Copying errors fell into two categories: base
substitutions and PCR cross-overs. Substitutions might
have been reduced by using a polymerase with a 3'
to 5' proof-reading activity such as Vent (New
England Biolabs, Beverly, MA) or Pfu (Stratagene,
La Jolla, CA) DNA polymerases. However, under a
range of conditions, these polymerases performed
poorly (data not shown). PCR cross-overs occurred
within the region of greatest homology, and were
most easily detected by unexpected combinations of
CDR1 and CDR2 due to a cross-over in framework
2. This emphasizes the importance of confirmation
from independent amplifications rather than from
multiple clones of the same PCR; indeed, germline
V H segments hv3005b54, hv3019b13, hv3019b18
(Olee et al., 1991) and V H 4.12, V H 4.14, V H 4.15 (Sanz
et al., 1989c) may be the result of PCR artifacts (see
above).
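
The detection of cross-overs by unexpected CDR1/CDR2 combinations can be illustrated with a short Python sketch. It is not the screening procedure used here, which relied on oligonucleotide probes and independent amplifications; it only shows the logic of flagging a clone whose CDR1 and CDR2 are each known individually but have not been seen together in an accepted segment. All names and the example strings in the usage note are hypothetical.

```python
def flag_crossovers(clones: dict, known: set) -> list:
    """Return names of clones with a novel pairing of known CDR1 and CDR2.

    `clones` maps clone names to (cdr1, cdr2) tuples; `known` is the set
    of (cdr1, cdr2) pairs found in accepted germline segments.
    """
    known_cdr1 = {c1 for c1, _ in known}
    known_cdr2 = {c2 for _, c2 in known}
    return sorted(
        name
        for name, (c1, c2) in clones.items()
        if (c1, c2) not in known and c1 in known_cdr1 and c2 in known_cdr2
    )
```

Given accepted pairs `("SYAMS", "AISGSG")` and `("GYYMH", "WINPNS")`, a clone carrying `("SYAMS", "WINPNS")` would be flagged as a candidate cross-over, whereas a clone with an entirely new CDR1 would not, since it may represent a genuine new segment.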

(b) Polymorphism 

In our directory (Fig. 2), which contains data
from many individuals, we have a total of 122 V H
segments with different nucleotide sequences (41
V H 1 segments, 5 V H 2 segments, 46 V H 3 segments, 22
V H 4 segments, 7 V H 5 segments and 1 V H 6 segment),
including 83 V H segments with open reading frames
and 39 pseudogenes. However, we cannot exclude
polymorphism and allelic variation or distinguish
between identical V H genes at different loci (possibly
the result of a recent duplication).

Southern blot analyses of restriction digests of
DNA using cDNA probes (van Dijk et al., 1991),
germline coding and flanking region probes
(Souroujon et al., 1989) or short sequence-specific
probes (Sanz et al., 1989c; Sasso et al., 1990;
van Dijk et al., 1991) have demonstrated restriction
fragment length polymorphisms (RFLPs) in the
V H 3, V H 4 and V H 5 families. Some insertion/deletion
polymorphisms have also been characterized and
shown to involve, for example, at least one V H 2, one
V H 3 and one V H 5 gene (Chen & Yang, 1990; Walter
et al., 1990), and one V H 1 gene (Shin et al., 1991).
Indeed, we failed to clone from DP several V H
segments reported in the literature, despite using
suitable PCR primers and probes. Some of the V H
segments not amplified from DP are also missing in
other individuals. For example, of the V H 4 segments
not amplified from our donor, one (V 58) seen in a
Japanese study (Lee et al., 1987) was not found in
an American study (Sanz et al., 1989c) and the
absence of a second V H 5 segment, VH32 (see Sanz
et al., 1989c), from our donor may be due to a
deletion polymorphism affecting V H 5 genes in 50%
of individuals (Sam et al., 1988).

In our directory, we found that the nucleotide
sequences of 23 V H segments from DP with open
reading frames were identical to those from
unrelated individuals. We found other V H segments
with a few nucleotide differences but with identical
translated CDRs 1 and 2 (bracketed in Fig. 2(a))
and these may correspond to different alleles. Thus,
the following V H 1 segments differ by one nucleotide:
V I-2, DP-8 and 1-1; DP-21 and V I-4.1b; DP-14 and
VH1GRR; 21-2/3-1/DP-7 and HG3; 7-2 and DP-4.
The following V H 3 segments differ by one to six
nucleotides: VHD26 and DP-30; DP-42 and 8-1B;
65-2/DP-44 and DP-45; f1-p1 and DP-61; hv3005,
hv3005f3 and GL-SJ2/DP-46. The following V H 4
segments differ by one or two nucleotides: DP-67
and V H SP/VH-JA/V H 4.22; V 79/V H 4.19/V IV-4b and
DP-70; V H 4.18 and V 2-1; V H 4.11/DP-71, V 71-4 and
V H 4.16. The following V H 5 segments differ by one
or two nucleotides: VH251/DP-73, V H VJB and
V H VCW; VH32 and V H VRG/V H VMW. Of course,
other V H segments, for example, DP-10 and hv1263,
and V I-3b/DP-25 and V I-3, may also be alleles, but
they encode differences in the CDRs and have
therefore been grouped separately. This is consistent with
the suggestion that even diverse V H segments (V II-5
and V II-5b; V IV-4 and V IV-4b) could be alleles (Shin et
al., 1991).

Hence, we find a "core" of V H segments with open
reading frames that are highly conserved in the
antigen binding regions and differ by only a small
number of nucleotides in the framework regions.
This limited sequence polymorphism between
unrelated individuals, together with the
insertion/deletion polymorphism, agrees with the suggestion
that the germline V H repertoire is derived from a
population of diverse haplotypes with a small
number of alleles at each locus (Sasso et al., 1990;
van Dijk et al., 1991).

In contrast to the limited sequence polymorphism
in V H segments with open reading frames, only five
pseudogenes amplified from DP are identical to V H
segments seen in unrelated individuals and a further
five pairs differ by one or two nucleotides. The
finding that certain pseudogenes are identical, or are
very similar, in unrelated individuals (see Fig. 2(c))
has been previously noted (Kodaira et al., 1986) and
may indicate a physiological role for them, possibly
as donors for gene conversion, as in the chicken
(Reynaud et al., 1989).

(c) Assignment of rearranged genes 

As shown in Figure 3(b), the majority of
rearranged genes, usually derived from mRNA, are
very closely related to their germline counterparts.
This confirms that these germline genes can be
rearranged and transcribed and are probably
translated into protein. Some of the differences between
the rearranged and germline genes could be due to
germline polymorphism, but as this is limited (see
above), the majority are probably caused by
somatic mutation. In a few examples, the sequences
of the rearranged V H genes appear to be a composite
of two V H segments (215B and 216G; Marks et al.,
1991b), which presumably arose by PCR cross-over.

The assignment of rearranged human V H genes to
their germline counterparts may help in dissecting
mechanisms of the human immune system. It
enables us to determine the relative usage of
particular V H segments (the possible underexpression of
V H 1 segments and overexpression of V H 4 segments)
and the number and location of somatic mutations
by which a particular antibody has been shaped. It
also allows us to differentiate between immune
responses that utilize V H segments with different
levels of somatic mutation. For example, it has been
repeatedly suggested that foetal antibodies and
autoantibodies are dominated by rarely mutated or
unmutated germline V H genes and that these
antibodies are often polyreactive (see Chen et al., 1990;
Hillson & Perlmutter, 1990; Siminovitch & Chen,
1990; Pascual & Capra, 1991).

Using our database of human rearranged V H genes
we find that about three-quarters of the genes of
foetal origin are germline at the level of amino acid
sequence and the rest have no more than five amino
acid changes (see references marked ¶ in Fig. 3
legend). However, in the case of autoantibodies
(autoimmunity related V H genes, see references
marked † in Fig. 3 legend) there is no clear
difference in the overall number of amino acid changes
compared to rearranged V H genes found in normal
peripheral blood lymphocytes (see references
marked ‡ in Fig. 3 legend). This does not support
the concept that autoantibodies are mainly encoded
by rarely mutated or unmutated V H genes and
reflects the current uncertainty about the origin of
autoantibodies and the role of antigen stimulation
(Dersimonian et al., 1990). Other interesting
features emerge for different B cell malignancies.
Whereas most of the V H genes isolated from acute
lymphoblastic leukaemia (ALL) patients are rarely
mutated or unmutated (Berman et al., 1988; Carroll
et al., 1989; Deane & Norton, 1990), about half the
V H genes isolated from patients with chronic
lymphocytic leukaemia (CLL) contain more than six
amino acid changes (see references marked § in
Fig. 3 legend). Very highly mutated V H genes (17,
20, 43 amino acid changes) have been detected in
other B cell tumours, such as myelomas (White et
al., 1990; Kenten et al., 1982; Yasui et al., 1989).

(d) Number of human V H segments 

Estimates of the number of human V H segments
per individual have been based on restriction digests
of genomic DNA probed for each V H family, but are
likely to be underestimates (due to bands
co-migrating on the gel). For example, Southern blot
analyses of digested DNA from HeLa and LA-N-5
cell lines yielded 60 to 80 hybridizing fragments
(Berman et al., 1988) but the authors estimated the
total number of V H segments to be between 100 and
200. More recently, two-dimensional pulsed-field gel
electrophoresis of digested homozygous DNA
(Walter et al., 1990) suggested a total of 76 V H
segments (25 V H 1 segments, 5 V H 2 segments, 28 V H 3
segments, 14 V H 4 segments, 3 V H 5 segments and 1
V H 6 segment).

We have cloned and sequenced 74 human V H 
segments (25 V H 1 segments, 3 V H 2 segments, 34 V H 3 
segments, 10 V H 4 segments, 1 V H 5 segment and 1 
V H 6 segment). Fifty-one of these have open reading
frames, and 23 contain either frame shifts or stop
codons and are therefore considered to be
pseudogenes. While the number of pseudogenes amplified
from DP is likely to be an underestimate due to 
primer bias, the number of V H segments with open 
reading frames (51) seems to correspond to the 
coding repertoire of an individual. Indeed, 215 of 
292 rearranged V H genes from different (non-DP) 
individuals have germline counterparts seen in DP. 
The extent to which our individual is representative 
of the human population as a whole depends on the 
exact nature of polymorphism within the V H locus. 
To determine this, we need a physical map of the V H 
segments from individuals with different genetic 
backgrounds, in which individual V H loci have been 
sequenced. This would tell us the number of 
different sequences in the human V H segment pool, 
the total number of loci and the number of alleles at 
each locus. 

(e) Structural diversity of human germline 
V H segments 

In order to focus on the structural diversity of 
antigen binding sites implicit in the germline V H 
repertoire of the human population, we grouped 
together (bracketed in Fig. 2(a)) those V H segments 
that encode identical CDRs 1 and 2. We have
selected those V H segments with rearranged
counterparts (marked R in Fig. 2(a)) and excluded a few V H
segments (marked X in Fig. 2(a)), which appear to
be defective on structural grounds (Chothia et al.,
1992) and therefore are unlikely to contribute to the
functional V H repertoire.

This suggests that the structural diversity
encoded by human germline V H segments is
determined by a minimum of 43 groups of rearranged V H
segments, each encoding identical CDR loops. This
figure is likely to increase as rearranged
counterparts of other germline V H segments in Figure 2(a)
are discovered and as a few additional germline
segments are determined from different individuals.
However, those V H segments with heptamers other 
than the 5'-CACAGTG-3' motif (marked H in 
Fig. 2(a)) and those with nonamers that differ from 
the family consensus (marked N in Fig. 2(a)) may 
be unable to recombine and hence not be expressed. 
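
The grouping of V H segments that encode identical CDRs 1 and 2, as described above, can be sketched as follows. The Python is illustrative only: it assumes the CDR1 and CDR2 protein sequences have already been extracted for each segment, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def group_by_cdrs(segments: dict) -> list:
    """Group segment names that share identical CDR1 and CDR2 sequences.

    `segments` maps segment names to (cdr1, cdr2) tuples of protein
    sequence. Returns one sorted list of names per structural group.
    """
    groups = defaultdict(list)
    for name, (cdr1, cdr2) in segments.items():
        groups[(cdr1, cdr2)].append(name)
    return [sorted(names) for names in groups.values()]
```

The number of structural groups is then `len(group_by_cdrs(segments))`; restricting the input to segments marked R and not marked X would mirror the selection described in the text.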

In order to determine the possible extent of 
sequence diversity, our variability plots (Fig. 4) are 
calculated using sequence data from all germline V H 
segments of the 43 structural groups and those 
germline sequences for which no rearranged 
counterparts have yet been discovered. The use of 
germline V H segments eliminates the effects of 
somatic mutation and sampling bias present in vari- 
ability plots of rearranged V H genes (Kabat et al., 
1991). 

The plots are consistent with the classification of
framework (FR) and complementarity-determining
regions (CDR) defined by Kabat et al. (1991), but
new features do emerge. Firstly, variability is higher
in CDR2 than in CDR1. Secondly, the
hypervariable region of CDR2 only comprises residues 50
to 58, rather than 50 to 65, with the last seven
residues of CDR2 (59 to 65) being highly conserved
within each of the six families. Thirdly, in addition
to CDR1 and CDR2, we find two regions of
unusually high variability across all six families.
One of them is residue 16 and the other is centred
around residue 73 and corresponds to a loop
adjacent to CDR2. The region in framework 3 is
particularly variable in the V H 1 family and may
function by altering the conformation of CDR2 for
antigen binding, or make additional contacts
directly with the antigen (as in the case of the light
chain FR3 in the D1.3/E225 complex: Bentley et al.,
1990). Alternatively, it may interact with an
unidentified ligand involved in the biology of the B
cell response, for example, a superantigen
(Schroeder et al., 1990; Sasso et al., 1991).

(f) Conclusion 

Our strategy has enabled us to determine the
human germline V H segments with open reading
frames from a single individual (DP). The
comparison with germline V H segments from other
individuals and with 292 rearranged V H genes suggests
that sequence polymorphism is limited, and that the
directory could be used to map the V H locus in
different individuals, to determine the usage of
specific V H segments in immune responses and to
detect somatic mutation or gene conversion events
in vivo.

The directory indicates that the structural
diversity of the germline repertoire for antigen binding is
fixed by about 50 groups of V H segments. Each
group encodes identical hypervariable loops and has
been seen as a rearranged gene. The limited
diversity encoded by germline V H segments emphasizes
the importance of the additional diversity provided
by the D and J H segments and by somatic mutation.
It suggests that our repertoire of V H segments from
DP should be sufficient for building libraries of
human antibodies with known components (Winter
& Milstein, 1991; Marks et al., 1991a).
Note added in proof. Since submission of this paper, we have amplified and cloned six additional V H
segments from DP (DP-75 to DP-80). EMBL Data Library accession numbers for DP-1 to DP-80:
Z12303-37, Z12602-3, Z12338-74 and Z14071-6.